College dataset

Description
Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.
Dimensions : 777 x 18
Short description of variables (appendix)

Sources
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the ASA Statistical Graphics Section's 1995 Data Analysis Exposition.

References
This dataset is part of the course material for the book: An Introduction to Statistical Learning with Applications in R
(Ch 02 - Statistical Learning - Applied Exercises - Problem 8)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Some preliminary workings

In [1]:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8a - Import data

In [2]:
  1. 777
  2. 19
A data.frame: 6 × 19
  X                            Private  Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
  <chr>                          <chr> <int>  <int>  <int>     <int>     <int>       <int>       <int>    <int>      <int> <int>    <int> <int>    <int>     <dbl>       <int>  <int>     <int>
1 Abilene Christian University     Yes  1660   1232    721        23        52        2885         537     7440       3300   450     2200    70       78      18.1          12   7041        60
2 Adelphi University               Yes  2186   1924    512        16        29        2683        1227    12280       6450   750     1500    29       30      12.2          16  10527        56
3 Adrian College                   Yes  1428   1097    336        22        50        1036          99    11250       3750   400     1165    53       66      12.9          30   8735        54
4 Agnes Scott College              Yes   417    349    137        60        89         510          63    12960       5450   450      875    92       97       7.7          37  19016        59
5 Alaska Pacific University        Yes   193    146     55        16        44         249         869     7560       4120   800     1500    76       72      11.9           2  10922        15
6 Albertson College                Yes   587    479    158        38        62         678          41    13500       3335   500      675    67       73       9.4          11   9727        55
In [3]:
Check for missing values
In [4]:
0
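A count of 0 like the one above is what a check along the following lines would return. This is only a sketch on a toy data frame: `college_toy` and the NA planted in it are assumptions for illustration, not the notebook's actual cell.

```r
# Toy stand-in for the College data; the NA is planted deliberately
# so that the check has something to find.
college_toy <- data.frame(
  Apps   = c(1660, 2186, NA),
  Accept = c(1232, 1924, 1097)
)

# Total number of missing cells across the whole data frame.
total_missing <- sum(is.na(college_toy))

# Missing cells per column, useful for locating the gaps.
missing_by_col <- colSums(is.na(college_toy))

print(total_missing)
print(missing_by_col)
```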
In [5]:
In [6]:
In [7]:
'data.frame':	777 obs. of  19 variables:
 $ X          : chr  "Abilene Christian University" "Adelphi University" "Adrian College" "Agnes Scott College" ...
 $ Private    : chr  "Yes" "Yes" "Yes" "Yes" ...
 $ Apps       : int  1660 2186 1428 417 193 587 353 1899 1038 582 ...
 $ Accept     : int  1232 1924 1097 349 146 479 340 1720 839 498 ...
 $ Enroll     : int  721 512 336 137 55 158 103 489 227 172 ...
 $ Top10perc  : int  23 16 22 60 16 38 17 37 30 21 ...
 $ Top25perc  : int  52 29 50 89 44 62 45 68 63 44 ...
 $ F.Undergrad: int  2885 2683 1036 510 249 678 416 1594 973 799 ...
 $ P.Undergrad: int  537 1227 99 63 869 41 230 32 306 78 ...
 $ Outstate   : int  7440 12280 11250 12960 7560 13500 13290 13868 15595 10468 ...
 $ Room.Board : int  3300 6450 3750 5450 4120 3335 5720 4826 4400 3380 ...
 $ Books      : int  450 750 400 450 800 500 500 450 300 660 ...
 $ Personal   : int  2200 1500 1165 875 1500 675 1500 850 500 1800 ...
 $ PhD        : int  70 29 53 92 76 67 90 89 79 40 ...
 $ Terminal   : int  78 30 66 97 72 73 93 100 84 41 ...
 $ S.F.Ratio  : num  18.1 12.2 12.9 7.7 11.9 9.4 11.5 13.7 11.3 11.5 ...
 $ perc.alumni: int  12 16 30 37 2 11 26 37 23 15 ...
 $ Expend     : int  7041 10527 8735 19016 10922 9727 8861 11487 11644 8991 ...
 $ Grad.Rate  : int  60 56 54 59 15 55 63 73 80 52 ...
Preliminary observations:
- No missing values.
- Currently, college/universities' names form part of the dataset. They will be added as rownames and removed from the executable data.
- Categorical variable 'Private' is presently saved as character. It will be converted to factor.
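The two preparation steps noted above (names to rownames, 'Private' to factor) could be sketched as follows. The small `raw` data frame reuses three rows from the output above; the object names are assumptions, not the notebook's own.

```r
# Three rows copied from the head() output, with names still in column X.
raw <- data.frame(
  X       = c("Abilene Christian University", "Adelphi University", "Adrian College"),
  Private = c("Yes", "Yes", "Yes"),
  Apps    = c(1660L, 2186L, 1428L),
  stringsAsFactors = FALSE
)

college <- raw

# Move institution names into the row names, then drop the column.
rownames(college) <- college$X
college$X <- NULL

# Store the categorical variable as a factor with both levels declared,
# even though this toy sample only contains "Yes".
college$Private <- factor(college$Private, levels = c("No", "Yes"))

str(college)
```

Declaring both levels up front keeps `table(college$Private)` reporting a zero count for "No" instead of silently omitting it.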
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8b - Data preparation

In [8]:
A data.frame: 3 × 18
                   Private  Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
                     <fct> <int>  <int>  <int>     <int>     <int>       <int>       <int>    <int>      <int> <int>    <int> <int>    <int>     <dbl>       <int>  <int>     <int>
Adelphi University     Yes  2186   1924    512        16        29        2683        1227    12280       6450   750     1500    29       30      12.2          16  10527        56
Franklin College       Yes   804    632    281        29        72         840          68    10390       4040   525     1345    54       78      12.5          37  11751        60
Boston University      Yes 20192  13007   3810        45        80       14971        3113    18420       6810   475     1025    80       81      11.9          16  16836        72
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c - Data exploration

2.8c.1 - Summary statistics

In [9]:
 Private        Apps           Accept          Enroll       Top10perc    
 No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00  
 Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00  
           Median : 1558   Median : 1110   Median : 434   Median :23.00  
           Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56  
           3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00  
           Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00  
   Top25perc      F.Undergrad     P.Undergrad         Outstate    
 Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340  
 1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320  
 Median : 54.0   Median : 1707   Median :  353.0   Median : 9990  
 Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441  
 3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925  
 Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700  
   Room.Board       Books           Personal         PhD        
 Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00  
 1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00  
 Median :4200   Median : 500.0   Median :1200   Median : 75.00  
 Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66  
 3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00  
 Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00  
    Terminal       S.F.Ratio      perc.alumni        Expend     
 Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186  
 1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751  
 Median : 82.0   Median :13.60   Median :21.00   Median : 8377  
 Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660  
 3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830  
 Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233  
   Grad.Rate     
 Min.   : 10.00  
 1st Qu.: 53.00  
 Median : 65.00  
 Mean   : 65.46  
 3rd Qu.: 78.00  
 Max.   :118.00  
In [10]:
A matrix: 2 × 1 of type chr
Private
No
Yes
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.2 - Scatterplot matrix

In [11]:
13
  1. 'Private'
  2. 'Apps'
  3. 'Accept'
  4. 'Enroll'
  5. 'Top10perc'
  6. 'Top25perc'
  7. 'Outstate'
  8. 'Room.Board'
  9. 'Personal'
  10. 'PhD'
  11. 'Terminal'
  12. 'Expend'
  13. 'Grad.Rate'
In [12]:
Tentative observations:
- Out-of-state tuition ('Outstate') at private colleges has a wider spread and a higher overall level than at non-private colleges.
- Private colleges have a much higher 'Expend' (instructional expenditure per student).
- There is a moderately positive relationship between 'Top10perc' (share of new students from the top 10% of their high-school class) and the 'Outstate' tuition charged.
#'#####################################################################
Correlation
In [13]:
A data.frame: 17 × 17
             Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
            <chr>  <chr>  <chr>     <chr>     <chr>       <chr>       <chr>    <chr>      <chr> <chr>    <chr> <chr>    <chr>     <chr>       <chr>  <chr>     <chr>
Apps            1  0.943  0.847         -         -       0.814           -        -          -     -        -     -        -         -           -      -         -
Accept      0.943      1  0.912         -         -       0.874           -        -          -     -        -     -        -         -           -      -         -
Enroll      0.847  0.912      1         -         -       0.965           -        -          -     -        -     -        -         -           -      -         -
Top10perc       -      -      -         1     0.892           -           -    0.562          -     -        -     -        -         -           -  0.661         -
Top25perc       -      -      -     0.892         1           -           -        -          -     -        -     -        -         -           -      -         -
F.Undergrad 0.814  0.874  0.965         -         -           1       0.571        -          -     -        -     -        -         -           -      -         -
P.Undergrad     -      -      -         -         -       0.571           1        -          -     -        -     -        -         -           -      -         -
Outstate        -      -      -     0.562         -           -           -        1      0.654     -        -     -        -    -0.555       0.566  0.673     0.571
Room.Board      -      -      -         -         -           -           -    0.654          1     -        -     -        -         -           -      -         -
Books           -      -      -         -         -           -           -        -          -     1        -     -        -         -           -      -         -
Personal        -      -      -         -         -           -           -        -          -     -        1     -        -         -           -      -         -
PhD             -      -      -         -         -           -           -        -          -     -        -     1     0.85         -           -      -         -
Terminal        -      -      -         -         -           -           -        -          -     -        -  0.85        1         -           -      -         -
S.F.Ratio       -      -      -         -         -           -           -   -0.555          -     -        -     -        -         1           - -0.584         -
perc.alumni     -      -      -         -         -           -           -    0.566          -     -        -     -        -         -           1      -         -
Expend          -      -      -     0.661         -           -           -    0.673          -     -        -     -        -    -0.584           -      1         -
Grad.Rate       -      -      -         -         -           -           -    0.571          -     -        -     -        -         -           -      -         1
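The table above shows only the stronger correlations and a '-' everywhere else. A minimal sketch of that masking is given below on a deterministic toy data frame. The cut-off `threshold <- 0.55` is an assumption (the smallest absolute value shown above is 0.555); the notebook's actual cut-off is not stated.

```r
# Deterministic toy data: b is strongly related to a, c is not.
toy <- data.frame(
  a = 1:10,
  b = (1:10)^2,
  c = rep(c(-1, 1), times = 5)
)

# Pearson correlations, rounded as in the table above.
r <- round(cor(toy), 3)

# Assumed cut-off: hide correlations weaker than this in absolute value.
threshold <- 0.55

# ifelse() keeps the dimensions and dimnames of its test argument,
# so the result is a character matrix shaped like r.
shown <- ifelse(abs(r) >= threshold, format(r), "-")

print(as.data.frame(shown))
```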
#'#####################################################################
Correlation plot
In [14]:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.3 - Boxplot

In [15]:
In [16]:
Observations:
- As seen in the pairs plot, out-of-state tuition at private colleges is much higher than at non-private colleges.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.4 - Elite

Elite >> universities with Top10perc > 50%
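The rule above can be sketched as a simple threshold on Top10perc. The first three values are taken from this notebook's outputs; the "Borderline" entry is a hypothetical case added to show that exactly 50 does not qualify.

```r
# Top10perc for a few institutions; "Borderline" is hypothetical.
top10perc <- c(
  "Massachusetts Institute of Technology" = 96,
  "Adelphi University"                    = 16,
  "Grove City College"                    = 57,
  "Borderline"                            = 50
)

# Strictly greater than 50, as in the definition above.
elite <- factor(ifelse(top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))

print(table(elite))
```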

In [17]:
In [18]:
No
699
Yes
78
In [19]:
78
65
In [20]:
A data.frame: 3 × 19
                             Private  Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite
                               <fct> <int>  <int>  <int>     <int>     <int>       <int>       <int>    <int>      <int> <int>    <int> <int>    <int>     <dbl>       <int>  <int>     <int> <fct>
Illinois Wesleyan University     Yes  3050   1342    471        55        86        1818          23    14360       4090   400      650    77       92      12.9          34   9605        83   Yes
Wesleyan University              Yes  4772   1973    712        60        86        2714          27    19130       5600  1400     1400    90       94      12.1          39  16262        92   Yes
University of Scranton           Yes  4471   2942    910        29        60        3674         493    11584       5986   650      800    83       83      14.1          41   9131        92    No
In [21]:
Observations:
- 83% (65 of 78) of the 'Elite' institutions are private.
- The distribution of Outstate tuition in Elite universities is heavily left-skewed (negatively skewed), i.e. most of the 'Elite' institutions charge high out-of-state tuition.
- The median Outstate tuition in Elite institutions is much higher than in non-Elite institutions, pointing to a clear difference in educational accessibility for out-of-state students.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.5 - Histograms

Freedman-Diaconis method has been used for calculating bin-widths.
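For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3). The sketch below applies it to the first ten Grad.Rate values shown earlier; `fd_binwidth` is a helper name made up here, and the notebook's own plotting code may differ (base R also offers `nclass.FD()` and `hist(x, breaks = "FD")`).

```r
# Freedman-Diaconis bin width: 2 * IQR(x) / n^(1/3).
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]
  2 * IQR(x) / length(x)^(1 / 3)
}

# First ten Grad.Rate values from the str() output above.
grad_rate <- c(60, 56, 54, 59, 15, 55, 63, 73, 80, 52)

h <- fd_binwidth(grad_rate)

# Break points that start at the minimum and extend past the maximum.
breaks <- seq(min(grad_rate), max(grad_rate) + h, by = h)

# plot = FALSE returns the bin counts without opening a graphics device.
hg <- hist(grad_rate, breaks = breaks, plot = FALSE)
print(h)
print(hg$counts)
```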

In [22]:
In [23]:
'a) Student expenditure related variables'
Observations:
- Out-of-state tuition and Room.Board expenses are slightly positively skewed.
- 'Books', 'Personal' expenses and 'Expend' (instructional expenditure per student) are strongly positively skewed.
- Median total expenditure including Outstate tuition is $16,079.
 For comparison, median US household income in the same year (1995) as per the US Census was ≈ $34,000.
In [24]:
In [25]:
A matrix: 2 × 7 of type dbl
         Outstate Room.Board   Books Personal TExp.without.O TExp.with.O  Expend
skew       0.5073     0.4755  3.4716   1.7358         0.5652      0.4549  3.4460
kurtosis  -0.4255    -0.2013 28.0633   7.0446         0.6157     -0.3825 18.5875
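The notebook does not show which estimator produced the skew and kurtosis values above. One common choice, sketched here as an assumption, is the moment-based skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3 (excess, because some values above are negative). The function names are hypothetical.

```r
# Moment-based skewness: third central moment over variance^1.5.
skew_moment <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3,
# so that a normal distribution scores 0.
kurt_excess <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Symmetric toy vector: skewness is 0, excess kurtosis is negative.
x <- c(1, 2, 3, 4, 5)
print(c(skew = skew_moment(x), kurtosis = kurt_excess(x)))
```

Other estimators (for example bias-corrected versions) give slightly different numbers, so small mismatches against the table would not be surprising.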
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
In [26]:
'b) Faculty and student related ratios'
Observations:
- PhD and Terminal are heavily left-skewed, i.e. at most institutions a large share of the faculty holds a PhD or terminal degree.
  PhD has one bin > 100. As PhD is a percentage, this is most likely a data-entry error.
- Few colleges have a student-faculty ratio > 20.
- Graduation rate varies widely, with 17.63% of institutions (137 of 777) having graduation rates below 50%.
- The middle 50% of institutions (Q1 to Q3) have between 13% and 31% of alumni who donate.

Note: see workings below for calculations

'#################### workings

PhD > 100
In [27]:
A matrix: 1 × 2 of type chr
Texas A&M University at Galveston103
Graduation rate
In [28]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.00   53.00   65.00   65.46   78.00  118.00 
137
In [29]:
17.63
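The 17.63% figure follows from 137 of 777 institutions. A sketch of the calculation is below; `grad_toy` is a synthetic vector built only to reproduce those two counts and is not the real Grad.Rate column.

```r
# Synthetic stand-in: 137 values below 50 and 640 values at or above 50.
grad_toy <- c(rep(40, 137), rep(70, 640))

# Number of institutions with a graduation rate below 50%.
n_below <- sum(grad_toy < 50)

# Share of all institutions, in percent, rounded to two decimals.
pct_below <- round(100 * mean(grad_toy < 50), 2)

print(n_below)
print(pct_below)
```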
Donor alumni
In [30]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   13.00   21.00   22.74   31.00   64.00 

'#################### workings'

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2.8c.6 - Further data exploration

a) Spending patterns - private vs non-private
In [31]:
A data.frame: 2 × 5
Private Count    Prop Median.Personal >1649
  <fct> <int> <table>           <dbl> <int>
     No   212   0.273            1649   106
    Yes   565   0.727            1100   103
In [32]:
In [33]:
Observations:
- The distribution of Personal spending is highly positively skewed for private institutions and moderately positively skewed for non-private ones.
- Median personal spending by students in non-private institutions ($1649) is higher than in private ones ($1100).
- The number of institutions where Personal spending is > $1649 (the non-private median) is nearly the same for private (103) and non-private (106), although private institutions are far more numerous (565 vs 212).
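A grouped summary with the same columns as the table above (count, proportion, median, count above the non-private median) could be built in base R roughly as follows. The five-row `toy` data frame and its values are hypothetical, chosen so the non-private median is 1649.

```r
# Hypothetical toy data; only the structure mirrors the College data.
toy <- data.frame(
  Private  = factor(c("No", "No", "No", "Yes", "Yes"), levels = c("No", "Yes")),
  Personal = c(1500, 1649, 2000, 1100, 1700)
)

# Cut-off: median Personal spending among non-private institutions.
cutoff <- median(toy$Personal[toy$Private == "No"])

counts <- table(toy$Private)

summary_tbl <- data.frame(
  Private         = names(counts),
  Count           = as.integer(counts),
  Prop            = round(as.numeric(prop.table(counts)), 3),
  Median.Personal = as.numeric(tapply(toy$Personal, toy$Private, median)),
  Above.Cutoff    = as.integer(tapply(toy$Personal > cutoff, toy$Private, sum))
)

print(summary_tbl)
```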
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
b) Most sought after college/university

Institutions with high

  • Top10perc
  • Top25perc
  • Apps
  • No. of Apps per Enroll
  • No. of Enroll per Accept
In [34]:
'data.frame':	777 obs. of  9 variables:
 $ Apps       : int  1660 2186 1428 417 193 587 353 1899 1038 582 ...
 $ Accept     : int  1232 1924 1097 349 146 479 340 1720 839 498 ...
 $ Enroll     : int  721 512 336 137 55 158 103 489 227 172 ...
 $ Top10perc  : int  23 16 22 60 16 38 17 37 30 21 ...
 $ Top25perc  : int  52 29 50 89 44 62 45 68 63 44 ...
 $ F.Undergrad: int  2885 2683 1036 510 249 678 416 1594 973 799 ...
 $ apr        : num  230 427 425 304 351 372 343 388 457 338 ...
 $ acr        : num  74 88 77 84 76 82 96 91 81 86 ...
 $ enr        : num  59 27 31 39 38 33 30 28 27 35 ...
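The three derived columns above are not defined in the text. Their values are consistent with the following percentages, which is an inference from the numbers shown rather than the notebook's confirmed code: apr = 100 * Apps / Enroll, acr = 100 * Accept / Apps, enr = 100 * Enroll / Accept, each rounded to a whole number.

```r
# First three institutions from the str() output above.
adm <- data.frame(
  Apps   = c(1660, 2186, 1428),
  Accept = c(1232, 1924, 1097),
  Enroll = c(721, 512, 336),
  row.names = c("Abilene Christian University", "Adelphi University", "Adrian College")
)

# Applications per 100 enrolled students.
adm$apr <- round(100 * adm$Apps / adm$Enroll)

# Acceptances per 100 applications (acceptance rate).
adm$acr <- round(100 * adm$Accept / adm$Apps)

# Enrolled students per 100 acceptances (enrollment rate, or yield).
adm$enr <- round(100 * adm$Enroll / adm$Accept)

print(adm)
```

Under this reading, a high apr together with a low acr marks a selective institution, while a high enr means most accepted applicants actually enroll.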
In [35]:
  1. 100
  2. 9
A data.frame: 10 × 9
                                       Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
                                      <int>  <int>  <int>     <int>     <int>       <int> <dbl> <dbl> <dbl>
Massachusetts Institute of Technology  6411   2140   1078        96        99        4481   595    33    50
Harvey Mudd College                    1377    572    178        95       100         654   774    42    31
University of California at Berkeley  19873   8252   3215        95       100       19532   618    42    39
Yale University                       10705   2453   1317        95        99        5217   813    23    54
Duke University                       13789   3893   1583        90        98        6188   871    28    41
Harvard University                    13865   2165   1606        90       100        6862   863    16    74
Princeton University                  13218   2042   1153        90        98        4540  1146    15    56
Georgia Institute of Technology        7837   4527   2276        89        99        8528   344    58    50
Brown University                      12586   3239   1462        87        95        5643   861    26    45
Dartmouth College                      8587   2273   1087        87        99        3918   790    26    48
In [36]:
A data.frame: 10 × 9
                                       Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
                                      <int>  <int>  <int>     <int>     <int>       <int> <dbl> <dbl> <dbl>
Harvey Mudd College                    1377    572    178        95       100         654   774    42    31
University of California at Berkeley  19873   8252   3215        95       100       19532   618    42    39
Harvard University                    13865   2165   1606        90       100        6862   863    16    74
University of California at Irvine    15698  10775   2478        85       100       12677   633    69    23
University of Pennsylvania            12394   5232   2464        85       100        9205   503    42    47
Bowdoin College                        3356   1019    418        76       100        1490   803    30    41
SUNY at Buffalo                       15039   9649   3087        36       100       13963   487    64    32
Massachusetts Institute of Technology  6411   2140   1078        96        99        4481   595    33    50
Yale University                       10705   2453   1317        95        99        5217   813    23    54
Georgia Institute of Technology        7837   4527   2276        89        99        8528   344    58    50
In [37]:
A data.frame: 5 × 9
                                       Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
                                      <int>  <int>  <int>     <int>     <int>       <int> <dbl> <dbl> <dbl>
Rutgers at New Brunswick              48094  26330   4520        36        79       21401  1064    55    17
Purdue University at West Lafayette   21804  18744   5874        29        60       26213   371    86    31
Boston University                     20192  13007   3810        45        80       14971   530    64    29
University of California at Berkeley  19873   8252   3215        95       100       19532   618    42    39
Pennsylvania State Univ. Main Campus  19315  10344   3450        48        93       28938   560    54    33
In [38]:
A data.frame: 5 × 9
                                    Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
                                   <int>  <int>  <int>     <int>     <int>       <int> <dbl> <dbl> <dbl>
Rutgers State University at Camden  3366   1752    232        27        79        2585  1451    52    13
Talladega College                   4414   1500    335        30        60         908  1318    34    22
SUNY College at New Paltz           8399   3609    656        19        53        4658  1280    43    18
Franklin Pierce College             5187   4471    446         3        14        1818  1163    86    10
Rutgers State University at Newark  5785   2690    499        26        62        4005  1159    46    19
In [39]:
A data.frame: 5 × 9
                                  Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
                                 <int>  <int>  <int>     <int>     <int>       <int> <dbl> <dbl> <dbl>
California Lutheran University     563    247    247        23        52        1427   228    44   100
Brewton-Parker College            1436   1228   1202        10        26        1320   119    86    98
Mississippi University for Women   480    405    380        19        46        1673   126    84    94
Peru State College                 701    501    458        10        40         959   153    71    91
Indiana Wesleyan University        735    423    366        20        48        2448   201    58    87
In [40]:
16
Most Sought-after Colleges/Universities (Final list)
In [41]:
A data.frame: 16 × 9
                                             Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
                                            <int>  <int>  <int>     <int>     <int>       <int> <dbl> <dbl> <dbl>
Massachusetts Institute of Technology        6411   2140   1078        96        99        4481   595    33    50
Yale University                             10705   2453   1317        95        99        5217   813    23    54
Harvard University                          13865   2165   1606        90       100        6862   863    16    74
Princeton University                        13218   2042   1153        90        98        4540  1146    15    56
Brown University                            12586   3239   1462        87        95        5643   861    26    45
Dartmouth College                            8587   2273   1087        87        99        3918   790    26    48
University of Pennsylvania                  12394   5232   2464        85       100        9205   503    42    47
Amherst College                              4302    992    418        83        96        1593  1029    23    42
Wellesley College                            2895   1249    579        80        96        2195   500    43    46
University of Notre Dame                     7700   3700   1906        79        96        7671   404    48    52
Columbia University                          6756   1930    871        78        96        3376   776    29    45
Davidson College                             2373    956    452        77        96        1601   525    40    47
University of North Carolina at Chapel Hill 14596   5985   3331        75        92       14609   438    41    56
University of Virginia                      15849   5384   2678        74        95       11278   592    34    50
Georgetown University                       11115   2881   1390        71        93        5881   800    26    48
Grove City College                           2491   1110    573        57        88        2213   435    45    52
#'################ workings
Function to show customized percentiles
In [42]:
In [43]:
A data.frame: 10 × 9
      Apps Accept Enroll Top10perc Top25perc F.Undergrad   apr   acr   enr
     <dbl>  <dbl>  <dbl>     <dbl>     <dbl>       <dbl> <dbl> <dbl> <dbl>
Min     81     72     35         1         9         139   119    15    10
5%     330    272    119         7        26         510   186    45    24
10%    458    362    154        10        31         641   210    54    26
25%    776    604    242        15        41         992   261    68    32
50%   1558   1110    434        23        54        1707   343    78    39
Mean  3002   2019    780        28        56        3700   382    75    41
75%   3624   2424    902        35        69        4005   454    85    49
90%   7675   4814   1904        50        85       10024   590    90    59
95%  11066   6979   2757        65        93       14478   705    93    67
Max  48094  26330   6392        96       100       31643  1451   100   100
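A helper that produces a table with the same row layout as the one above (Min, selected percentiles, Mean, Max) might look like the sketch below. `pct_table` and its default `probs` are assumptions based on the row labels shown; the notebook's own function is not visible.

```r
# Build a Min / percentiles / Mean / Max table for every numeric column.
pct_table <- function(df, probs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)) {
  one_col <- function(x) {
    # quantile() names its results "5%", "10%", ... by default.
    q <- quantile(x, probs = probs, na.rm = TRUE)
    c(
      Min  = min(x, na.rm = TRUE),
      q[probs <= 0.5],
      Mean = mean(x, na.rm = TRUE),
      q[probs > 0.5],
      Max  = max(x, na.rm = TRUE)
    )
  }
  # sapply() returns a matrix with one column per variable.
  round(as.data.frame(sapply(df, one_col)))
}

# Toy input: the integers 1..101 make the expected percentiles easy to check.
toy <- data.frame(v = 1:101, w = seq(0, 1000, by = 10))
out <- pct_table(toy)
print(out)
```

Placing Mean between the 50% and 75% rows mirrors the table above and makes the gap between mean and median easy to read off for skewed variables.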

'################ workings'

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c) Further analysis of most sought-after colleges/universities
In [44]:
 No Yes 
  2  14 
In [45]:
A data.frame: 10 × 19
      Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate TExp.without.O TExp.with.O
     <dbl>  <dbl>  <dbl>     <dbl>     <dbl>       <dbl>       <dbl>    <dbl>      <dbl> <dbl>    <dbl> <dbl>    <dbl>     <dbl>       <dbl>  <dbl>     <dbl>          <dbl>       <dbl>
Min     81     72     35         1         9         139           1     2340       1780    96      250     8       24         2           0   3186        10           3452        6604
5%     330    272    119         7        26         510          20     4602       2736   350      500    44       53         8           6   4796        37           4450        9847
10%    458    362    154        10        31         641          35     5569       3051   400      600    51       59        10           8   5558        45           4809       11006
25%    776    604    242        15        41         992          95     7320       3597   470      850    62       71        12          13   6751        53           5400       13279
50%   1558   1110    434        23        54        1707         353     9990       4200   500     1200    75       82        14          21   8377        65           6100       16079
Mean  3002   2019    780        28        56        3700         855    10441       4358   549     1341    73       80        14          23   9660        65           6248       16688
75%   3624   2424    902        35        69        4005         967    12925       5050   600     1700    85       92        16          31  10830        78           6958       19650
90%   7675   4814   1904        50        85       10024        2017    16553       5950   700     2200    92       96        19          40  14841        89           7922       23430
95%  11066   6979   2757        65        93       14478        3304    18498       6382   766     2489    95       98        21          46  17975        94           8392       26030
Max  48094  26330   6392        96       100       31643       21836    21700       8124  2340     6800   103      100        40          64  56233       118          12330       29095
A data.frame: 16 × 21
                                            Private  Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite TExp.without.O TExp.with.O
                                              <fct> <int>  <int>  <int>     <int>     <int>       <int>       <int>    <int>      <int> <int>    <int> <int>    <int>     <dbl>       <int>  <int>     <int> <fct>          <int>       <int>
Massachusetts Institute of Technology           Yes  6411   2140   1078        96        99        4481          28    20100       5975   725     1600    99       99      10.1          35  33541        94   Yes           8300       28400
Yale University                                 Yes 10705   2453   1317        95        99        5217          83    19840       6510   630     2115    96       96       5.8          49  40386        99   Yes           9255       29095
Harvard University                              Yes 13865   2165   1606        90       100        6862         320    18485       6410   500     1920    97       97       9.9          52  37219       100   Yes           8830       27315
Princeton University                            Yes 13218   2042   1153        90        98        4540         146    19900       5910   675     1575    91       96       8.4          54  28320        99   Yes           8160       28060
Brown University                                Yes 12586   3239   1462        87        95        5643         349    19528       5926   720     1100    99      100       7.6          39  20440        97   Yes           7746       27274
Dartmouth College                               Yes  8587   2273   1087        87        99        3918          32    19545       6070   550     1100    95       99       4.7          49  29619        98   Yes           7720       27265
University of Pennsylvania                      Yes 12394   5232   2464        85       100        9205         531    17020       7270   500     1544    95       96       6.3          38  25765        93   Yes           9314       26334
Amherst College                                 Yes  4302    992    418        83        96        1593           5    19760       5300   660     1598    93       98       8.4          63  21424       100   Yes           7558       27318
Wellesley College                               Yes  2895   1249    579        80        96        2195         156    18345       5995   500      700    94       98      10.6          51  21409        91   Yes           7195       25540
University of Notre Dame                        Yes  7700   3700   1906        79        96        7671          30    16850       4400   600     1350    96       92      13.1          46  13936        97   Yes           6350       23200
Columbia University                             Yes  6756   1930    871        78        96        3376          55    18624       6664   550      300    97       98       5.9          21  30639        99   Yes           7514       26138
Davidson College                                Yes  2373    956    452        77        96        1601           6    17295       5070   600     1011    95       97      12.0          46  17581        94   Yes           6681       23976
University of North Carolina at Chapel Hill      No 14596   5985   3331        75        92       14609        1100     8400       4200   550     1200    88       93       8.9          23  15893        83   Yes           5950       14350
University of Virginia                           No 15849   5384   2678        74        95       11278         114    12212       3792   500     1000    90       92       9.5          22  13597        95   Yes           5292       17504
Georgetown University                           Yes 11115   2881   1390        71        93        5881         406    18300       7131   670     1700    91       92       7.2          27  19635        95   Yes           9501       27801
Grove City College                              Yes  2491   1110    573        57        88        2213          35     5224       3048   525      350    65       65      18.4          18   4957       100   Yes           3923        9147
Observations:
- 14 of the 16 (87.5%) MSAs (most sought-after institutions) are private.
- 15 (93.75%) have a Grad.Rate above the 90th percentile; the remaining one (83% graduation rate) sits at roughly the 85th percentile.
- Outstate tuition for 13 (81.25%) MSAs is above the 90th percentile.
#'################ workings
Percentile table (single variable)
In [46]:
< 90  : 1 ; 6.25
>= 90 : 15 ;  93.75
A data.frame: 16 × 5
     S.N College                                     Grad.Rate Percentile Percentile_Main
   <int> <chr>                                           <int>      <dbl>           <dbl>
 3     1 Harvard University                                100      99.87           99.49
 8     2 Amherst College                                   100      99.87           98.20
16     3 Grove City College                                100      99.87           93.56
 2     4 Yale University                                    99      98.58           99.87
 4     5 Princeton University                               99      98.58           99.49
11     6 Columbia University                                99      98.58           97.68
 6     7 Dartmouth College                                  98      97.94           98.97
 5     8 Brown University                                   97      97.30           98.97
10     9 University of Notre Dame                           97      97.30           97.81
14    10 University of Virginia                             95      95.62           96.65
15    11 Georgetown University                              95      95.62           96.53
 1    12 Massachusetts Institute of Technology              94      94.98          100.00
12    13 Davidson College                                   94      94.98           97.55
 7    14 University of Pennsylvania                         93      94.47           98.58
 9    15 Wellesley College                                  91      93.05           97.94
13    16 University of North Carolina at Chapel Hill        83      84.81           97.17
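The Percentile column above is consistent with an empirical-CDF rank against the full sample (for example 776 of 777 institutions have Grad.Rate <= 100, giving 99.87). The sketch below shows that calculation on the first ten Grad.Rate values; `pct_rank` is a hypothetical helper, and whether the notebook used `ecdf()` is an inference.

```r
# Percentile rank of each value within a reference sample:
# the share of reference observations less than or equal to the value.
pct_rank <- function(values, reference) {
  round(100 * ecdf(reference)(values), 2)
}

# First ten Grad.Rate values from the str() output earlier.
grad_rate <- c(60, 56, 54, 59, 15, 55, 63, 73, 80, 52)

ranks <- pct_rank(c(60, 80), grad_rate)
print(ranks)
```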
Loading required package: SuppDists


 Anderson-Darling k-sample test.

Number of samples:  2
Sample sizes:  777, 16
Number of ties: 712

Mean of  Anderson-Darling  Criterion: 1
Standard deviation of  Anderson-Darling  Criterion: 0.76416

T.AD = ( Anderson-Darling  Criterion - mean)/sigma

Null Hypothesis: All samples come from a common population.

              AD  T.AD  asympt. P-value
version 1: 30.76 38.95        4.644e-17
version 2: 31.50 39.88        2.101e-17
Warning message in ks.test(base_df[, var], subset_df[, var]):
"p-value will be approximate in the presence of ties"
	Two-sample Kolmogorov-Smirnov test

data:  base_df[, var] and subset_df[, var]
D = 0.85513, p-value = 2.206e-10
alternative hypothesis: two-sided
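The two outputs above compare the distribution of one variable in the full sample against a subset. A sketch of the same pair of tests on synthetic vectors follows; `base_sample` and `subset_sample` are made-up stand-ins, and the Anderson-Darling part runs only if the kSamples package happens to be installed.

```r
# Synthetic stand-ins: a wide base sample and a subset concentrated at the top.
base_sample   <- seq(10, 118, length.out = 200)
subset_sample <- seq(83, 100, length.out = 16)

# Two-sample Kolmogorov-Smirnov test (base R, package stats).
ks <- ks.test(base_sample, subset_sample)
print(ks$statistic)
print(ks$p.value)

# Anderson-Darling k-sample test, only when kSamples is available.
if (requireNamespace("kSamples", quietly = TRUE)) {
  ad <- kSamples::ad.test(base_sample, subset_sample)
  print(ad)
}
```

The KS statistic is driven by the largest gap between the two empirical CDFs, while the Anderson-Darling criterion weights the tails more heavily, which is a plausible reason to report both.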
'#####################################################################
Boxplots
In [47]:
'#####################################################################
Statistical test
In [48]:
A data.frame: 19 × 4
                    KS  AD(v1)  AD(v2) Significant
                 <chr>   <chr>   <chr>       <chr>
Apps             1e-05       0       0           Y
Accept         0.00535 0.00343 0.00336           Y
Enroll         0.00075 0.00012 0.00012           Y
Top10perc            0       0       0           Y
Top25perc            0       0       0           Y
F.Undergrad    0.00164 0.00059 0.00058           Y
P.Undergrad    0.04304 0.00443 0.00463           Y
Outstate             0       0       0           Y
Room.Board     0.00052   2e-05   2e-05           Y
Books          0.19361 0.03925 0.02513           -
Personal       0.91012 0.48468 0.42838           N
PhD                  0       0       0           Y
Terminal             0       0       0           Y
S.F.Ratio            0       0       0           Y
perc.alumni    0.00051       0       0           Y
Expend               0       0       0           Y
Grad.Rate            0       0       0           Y
TExp.without.O 0.00175 0.00016 0.00015           Y
TExp.with.O          0       0       0           Y
'################ workings'
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
d) Top 20 colleges by applications
In [49]:
A data.frame: 10 × 19
      Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate TExp.without.O TExp.with.O
     <dbl>  <dbl>  <dbl>     <dbl>     <dbl>       <dbl>       <dbl>    <dbl>      <dbl> <dbl>    <dbl> <dbl>    <dbl>     <dbl>       <dbl>  <dbl>     <dbl>          <dbl>       <dbl>
Min     81     72     35         1         9         139           1     2340       1780    96      250     8       24         2           0   3186        10           3452        6604
5%     330    272    119         7        26         510          20     4602       2736   350      500    44       53         8           6   4796        37           4450        9847
10%    458    362    154        10        31         641          35     5569       3051   400      600    51       59        10           8   5558        45           4809       11006
25%    776    604    242        15        41         992          95     7320       3597   470      850    62       71        12          13   6751        53           5400       13279
50%   1558   1110    434        23        54        1707         353     9990       4200   500     1200    75       82        14          21   8377        65           6100       16079
Mean  3002   2019    780        28        56        3700         855    10441       4358   549     1341    73       80        14          23   9660        65           6248       16688
75%   3624   2424    902        35        69        4005         967    12925       5050   600     1700    85       92        16          31  10830        78           6958       19650
90%   7675   4814   1904        50        85       10024        2017    16553       5950   700     2200    92       96        19          40  14841        89           7922       23430
95%  11066   6979   2757        65        93       14478        3304    18498       6382   766     2489    95       98        21          46  17975        94           8392       26030
Max  48094  26330   6392        96       100       31643       21836    21700       8124  2340     6800   103      100        40          64  56233       118          12330       29095
A data.frame: 20 × 24
                                            Private  Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal   PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate Elite TExp.without.O TExp.with.O   apr   acr   enr
                                              <fct> <int>  <int>  <int>     <int>     <int>       <int>       <int>    <int>      <int> <int>    <int> <int>    <int>     <dbl>       <int>  <int>     <int> <fct>          <int>       <int> <dbl> <dbl> <dbl>
Rutgers at New Brunswick                         No 48094  26330   4520        36        79       21401        3712     7410       4748   690     2009    90       95      19.5          19  10474        77    No           7447       14857  1064    55    17
Purdue University at West Lafayette              No 21804  18744   5874        29        60       26213        4065     9556       3990   570     1060    86       86      18.2          15   8604        67    No           5620       15176   371    86    31
Boston University                               Yes 20192  13007   3810        45        80       14971        3113    18420       6810   475     1025    80       81      11.9          16  16836        72    No           8310       26730   530    64    29
University of California at Berkeley             No 19873   8252   3215        95       100       19532        2061    11648       6246   636     1933    93       97      15.8          10  13919        78   Yes           8815       20463   618    42    39
Pennsylvania State Univ. Main Campus             No 19315  10344   3450        48        93       28938        2025    10645       4060   512     2394    77       96      18.1          19   8992        63    No           6966       17611   560    54    33
University of Michigan at Ann Arbor              No 19152  12940   4893        66        92       22045        1339    15732       4659   476     1600    90       98      11.5          26  14847        87   Yes           6735       22467   391    68    38
Michigan State University                        No 18114  15096   6180        23        57       26640        4120    10658       3734   504      600    93       95      14.0           9  10520        71    No           4838       15496   293    83    41
Indiana University at Bloomington                No 16587  13243   5873        25        72       24763        2717     9766       3990   600     2000    77       88      21.3          24   8686        68    No           6590       16356   282    80    44
University of Virginia                           No 15849   5384   2678        74        95       11278         114    12212       3792   500     1000    90       92       9.5          22  13597        95   Yes           5292       17504   592    34    50
Virginia Tech                                    No 15712  11719   4277        29        53       18511         604    10260       3176   740     2200    85       89      13.8          20   8944        73    No           6116       16376   367    75    36
University of California at Irvine               No 15698  10775   2478        85       100       12677         864    12024       5302   790     1818    96       96      16.1          11  15934        66   Yes           7910       19934   633    69    23
SUNY at Buffalo                                  No 15039   9649   3087        36       100       13963        3124     6550       4731   708      957    90       97      13.6          15  11177        56    No           6396       12946   487    64    32
University of Illinois - Urbana                  No 14939  11652   5705        52        88       25422         911     7560       4574   500     1982    87       90      17.4          13   8559        81   Yes           7056       14616   262    78    49
University of Wisconsin at Madison               No 14901  10932   4631        36        80       23945        2200     9096       4290   535     1545    93       96      11.5          20  11006        72    No           6370       15466   322    73    42
University of Texas at Austin                    No 14752   9572   5329        48        85       30017        5189     5130       3309   650     3140    91       99      19.7          11   7837        65    No           7099       12229   277    65    56
University of North Carolina at Chapel Hill      No 14596   5985   3331        75        92       14609        1100     8400       4200   550     1200    88       93       8.9          23  15893        83   Yes           5950       14350   438    41    56
Texas A&M Univ. at College Station               No 14474  10519   6392        49        85       31643        2798     5130       3412   600     2144    89       91      23.1          29   8471        69    No           6156       11286   226    73    61
SUNY at Binghamton                               No 14463   6166   1757        60        94        8544         671     6550       4598   700     1000    83      100      18.0          15   8055        80   Yes           6298       12848   823    43    28
University of Delaware                          Yes 14446  10516   3252        22        57       14130        4522    10220       4230   530     1300    82       87      18.3          15  10650        75    No           6060       16280   444    73    31
University of Massachusetts at Amherst           No 14438  12414   3816        12        39       16282        1940     8566       3897   500     1400    88       92      16.7          15  10276        68    No           5797       14363   378    86    31
Observations:
- 18 of the top 20 Apps colleges are non-private.
- Institutions with high applications (HAIs) also have high acceptance.
- High applications come with high enrollment numbers, but the distribution of enrollment rates for HAIs is not very different from the overall distribution of enrollment rates.
This suggests that although applications are high, not many applicants go on to enroll. Many applications could be backup applications.
- HAIs differ significantly in distribution from the overall sample for all variables except Outstate, Room.Board, Books, Personal, S.F.Ratio and perc.alumni (and the derived totals TExp.without.O and TExp.with.O).

Note: See workings below.
#'################ workings
Percentile table (single variable)
In [50]:
< 90  : 19 ; 95
>= 90 : 1 ;  5
A data.frame: 20 × 5
     S.N College                                     Grad.Rate Percentile Percentile_Main
   <int> <chr>                                           <int>      <dbl>           <dbl>
 9     1 University of Virginia                             95      95.62           98.97
 6     2 University of Michigan at Ann Arbor                87      88.80           99.36
16     3 University of North Carolina at Chapel Hill        83      84.81           98.07
13     4 University of Illinois - Urbana                    81      80.95           98.46
18     5 SUNY at Binghamton                                 80      79.67           97.81
 4     6 University of California at Berkeley               78      76.06           99.61
 1     7 Rutgers at New Brunswick                           77      74.52          100.00
19     8 University of Delaware                             75      71.43           97.68
10     9 Virginia Tech                                      73      67.70           98.84
 3    10 Boston University                                  72      66.02           99.74
14    11 University of Wisconsin at Madison                 72      66.02           98.33
 7    12 Michigan State University                          71      62.93           99.23
17    13 Texas A&M Univ. at College Station                 69      59.72           97.94
 8    14 Indiana University at Bloomington                  68      57.66           99.10
20    15 University of Massachusetts at Amherst             68      57.66           97.55
 2    16 Purdue University at West Lafayette                67      55.34           99.87
11    17 University of California at Irvine                 66      52.38           98.71
15    18 University of Texas at Austin                      65      50.32           98.20
 5    19 Pennsylvania State Univ. Main Campus               63      45.05           99.49
12    20 SUNY at Buffalo                                    56      31.27           98.58

 Anderson-Darling k-sample test.

Number of samples:  2
Sample sizes:  777, 20
Number of ties: 716

Mean of  Anderson-Darling  Criterion: 1
Standard deviation of  Anderson-Darling  Criterion: 0.7632

T.AD = ( Anderson-Darling  Criterion - mean)/sigma

Null Hypothesis: All samples come from a common population.

              AD  T.AD  asympt. P-value
version 1: 3.832 3.710         0.010650
version 2: 3.970 3.887         0.009183
Warning message in ks.test(base_df[, var], subset_df[, var]):
"p-value will be approximate in the presence of ties"
	Two-sample Kolmogorov-Smirnov test

data:  base_df[, var] and subset_df[, var]
D = 0.37619, p-value = 0.008022
alternative hypothesis: two-sided
#'################ workings
Enroll v Enroll rate
In [51]:
In [52]:
In [53]:
In [54]:
#'#####################################################################
Boxplots
In [55]:
#'#####################################################################
Statistical test
In [56]:
A data.frame: 19 × 4
                    KS  AD(v1)  AD(v2) Significant
                 <chr>   <chr>   <chr>       <chr>
Apps                 0       0       0           Y
Accept               0       0       0           Y
Enroll               0       0       0           Y
Top10perc        7e-04   1e-05   1e-05           Y
Top25perc        2e-05       0       0           Y
F.Undergrad          0       0       0           Y
P.Undergrad          0       0       0           Y
Outstate       0.45051   0.549  0.5532           N
Room.Board     0.61306 0.48687 0.48912           N
Books          0.16298 0.02756 0.03603           -
Personal       0.11853 0.03738 0.03615           -
PhD              3e-05       0       0           Y
Terminal         5e-05   1e-05   1e-05           Y
S.F.Ratio      0.06145 0.04701 0.04625           -
perc.alumni    0.05265 0.02049 0.02054           -
Expend         0.00257 0.00228 0.00223           Y
Grad.Rate      0.00802 0.01065 0.00918           Y
TExp.without.O 0.17316 0.17507 0.17272           N
TExp.with.O    0.55501 0.46943  0.4716           N
#'################ workings'
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
e) Further analysis of Elite colleges
In [57]:
  1. 78
  2. 21
A data.frame: 10 × 19
          Apps   Accept   Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board    Books  Personal     PhD  Terminal  S.F.Ratio  perc.alumni    Expend  Grad.Rate  TExp.without.O  TExp.with.O
         <dbl>    <dbl>    <dbl>      <dbl>      <dbl>        <dbl>        <dbl>     <dbl>       <dbl>    <dbl>     <dbl>   <dbl>     <dbl>      <dbl>        <dbl>     <dbl>      <dbl>           <dbl>        <dbl>
Min      81.00     72.0    35.00       1.00        9.0       139.00          1.0   2340.00     1780.00    96.00    250.00    8.00      24.0       2.50         0.00   3186.00      10.00         3452.00      6604.00
5%      329.80    272.4   118.60       7.00       25.8       509.80         20.0   4601.60     2735.80   350.00    500.00   43.80      52.8       8.30         6.00   4795.80      37.00         4450.00      9846.60
10%     457.60    361.6   154.00      10.00       30.6       641.00         35.0   5568.80     3051.20   400.00    600.00   50.60      59.0       9.90         8.00   5558.20      44.60         4809.20     11006.00
25%     776.00    604.0   242.00      15.00       41.0       992.00         95.0   7320.00     3597.00   470.00    850.00   62.00      71.0      11.50        13.00   6751.00      53.00         5400.00     13279.00
50%    1558.00   1110.0   434.00      23.00       54.0      1707.00        353.0   9990.00     4200.00   500.00   1200.00   75.00      82.0      13.60        21.00   8377.00      65.00         6100.00     16079.00
Mean   3001.64   2018.8   779.97      27.56       55.8      3699.91        855.3  10440.67     4357.53   549.38   1340.64   72.66      79.7      14.09        22.74   9660.17      65.46         6247.55     16688.22
75%    3624.00   2424.0   902.00      35.00       69.0      4005.00        967.0  12925.00     5050.00   600.00   1700.00   85.00      92.0      16.50        31.00  10830.00      78.00         6958.00     19650.00
90%    7675.00   4814.2  1903.60      50.40       85.0     10024.40       2016.6  16552.80     5950.00   700.00   2200.00   92.00      96.0      19.20        40.00  14841.00      89.00         7922.40     23430.00
95%   11066.20   6979.2  2757.00      65.20       93.0     14477.80       3303.6  18498.00     6382.00   765.60   2488.80   95.00      98.0      21.00        46.00  17974.80      94.20         8392.00     26030.00
Max   48094.00  26330.0  6392.00      96.00      100.0     31643.00      21836.0  21700.00     8124.00  2340.00   6800.00  103.00     100.0      39.80        64.00  56233.00     118.00        12330.00     29095.00
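A summary with this row layout (Min, selected quantiles, Mean, Max) can be built with `quantile()` and `sapply()`. The sketch below is one possible way to do it, run on a small synthetic data frame; the helper name `summarise_col` and the data are hypothetical, and the notebook may have used a different quantile type or rounding.

```r
# Synthetic data with two of the column names from the table above.
set.seed(42)
df <- data.frame(
  Apps      = rpois(200, lambda = 3000),
  S.F.Ratio = round(runif(200, min = 5, max = 25), 1)
)

probs <- c(0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1)

# Returns Min, quantiles, Mean and Max in the same row order as the table.
summarise_col <- function(x) {
  q <- quantile(x, probs = probs, names = FALSE)
  c(Min = q[1], `5%` = q[2], `10%` = q[3], `25%` = q[4], `50%` = q[5],
    Mean = mean(x),
    `75%` = q[6], `90%` = q[7], `95%` = q[8], Max = q[9])
}

# One column per variable, one row per statistic.
quantile_tbl <- as.data.frame(round(sapply(df, summarise_col), 2))
print(quantile_tbl)
```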
A data.frame: 20 × 21
                                       Private   Apps  Accept  Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board  Books  Personal    PhD  Terminal  S.F.Ratio  perc.alumni  Expend  Grad.Rate  Elite  TExp.without.O  TExp.with.O
                                         <fct>  <int>   <int>   <int>      <int>      <int>        <int>        <int>     <int>       <int>  <int>     <int>  <int>     <int>      <dbl>        <int>   <int>      <int>  <fct>           <int>        <int>
Massachusetts Institute of Technology      Yes   6411    2140    1078         96         99         4481           28     20100        5975    725      1600     99        99       10.1           35   33541         94    Yes            8300        28400
Harvey Mudd College                        Yes   1377     572     178         95        100          654            5     17230        6690    700       900    100       100        8.2           46   21569        100    Yes            8290        25520
University of California at Berkeley        No  19873    8252    3215         95        100        19532         2061     11648        6246    636      1933     93        97       15.8           10   13919         78    Yes            8815        20463
Yale University                            Yes  10705    2453    1317         95         99         5217           83     19840        6510    630      2115     96        96        5.8           49   40386         99    Yes            9255        29095
Harvard University                         Yes  13865    2165    1606         90        100         6862          320     18485        6410    500      1920     97        97        9.9           52   37219        100    Yes            8830        27315
Duke University                            Yes  13789    3893    1583         90         98         6188           53     18590        5950    625      1162     95        96        5.0           44   27206         97    Yes            7737        26327
Princeton University                       Yes  13218    2042    1153         90         98         4540          146     19900        5910    675      1575     91        96        8.4           54   28320         99    Yes            8160        28060
Georgia Institute of Technology             No   7837    4527    2276         89         99         8528          654      6489        4438    795      1164     92        92       19.3           33   11271         70    Yes            6397        12886
Dartmouth College                          Yes   8587    2273    1087         87         99         3918           32     19545        6070    550      1100     95        99        4.7           49   29619         98    Yes            7720        27265
Brown University                           Yes  12586    3239    1462         87         95         5643          349     19528        5926    720      1100     99       100        7.6           39   20440         97    Yes            7746        27274
Pepperdine University                      Yes   3821    2037     680         86         96         2488          625     18200        6770    500       700     95        98       11.6           13   16185         66    Yes            7970        26170
University of California at Irvine          No  15698   10775    2478         85        100        12677          864     12024        5302    790      1818     96        96       16.1           11   15934         66    Yes            7910        19934
University of Pennsylvania                 Yes  12394    5232    2464         85        100         9205          531     17020        7270    500      1544     95        96        6.3           38   25765         93    Yes            9314        26334
Northwestern University                    Yes  12289    5200    1902         85         98         7450           45     16404        5520    759      1585     96       100        6.8           25   26385         92    Yes            7864        24268
Amherst College                            Yes   4302     992     418         83         96         1593            5     19760        5300    660      1598     93        98        8.4           63   21424        100    Yes            7558        27318
Williams College                           Yes   4186    1245     526         81         96         1988           29     19629        5790    500      1200     94        99        9.0           64   22014         99    Yes            7490        27119
Wellesley College                          Yes   2895    1249     579         80         96         2195          156     18345        5995    500       700     94        98       10.6           51   21409         91    Yes            7195        25540
University of Notre Dame                   Yes   7700    3700    1906         79         96         7671           30     16850        4400    600      1350     96        92       13.1           46   13936         97    Yes            6350        23200
Columbia University                        Yes   6756    1930     871         78         96         3376           55     18624        6664    550       300     97        98        5.9           21   30639         99    Yes            7514        26138
Davidson College                           Yes   2373     956     452         77         96         1601            6     17295        5070    600      1011     95        97       12.0           46   17581         94    Yes            6681        23976
'Elite' : Colleges/universities where more than 50% of new students come from the top 10% of their high school class (Top10perc > 50).
'Top10perc' : Percentage of new students from the top 10% of their high school class
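The 'Elite' definition above translates directly into a factor built from Top10perc. The sketch below uses a tiny made-up data frame; the object names `college` and `elite_df` and the college labels are hypothetical, not taken from the notebook.

```r
# Five hypothetical colleges with made-up Top10perc values.
college <- data.frame(
  Top10perc = c(96, 23, 51, 50, 10),
  row.names = c("A", "B", "C", "D", "E")
)

# 'Elite' is "Yes" when more than 50% of new students come from
# the top 10% of their high school class.
college$Elite <- factor(ifelse(college$Top10perc > 50, "Yes", "No"),
                        levels = c("No", "Yes"))

# Subset of Elite institutions, ordered by Top10perc (highest first).
elite_df <- college[college$Elite == "Yes", , drop = FALSE]
elite_df <- elite_df[order(-elite_df$Top10perc), , drop = FALSE]

print(table(college$Elite))
print(elite_df)
```

Note that a college with exactly 50% (row "D") is not counted as Elite under the strict inequality.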

Observations:
- There are 78 (10% of 777) 'Elite' institutions.
- The distribution of every variable in 'Elite' colleges differs from that variable's overall distribution, except for 'Books' and 'Personal'.

Top 20 Elite:
- Unsurprisingly, the top 'Elite' institutions also have the highest proportions of students who graduated in the top 25% of their high school class. All 20 are at or above the 97th percentile of Top25perc.
- The minimum 'PhD' and 'Terminal' proportions are 91% and 92% respectively.
- 19 of the 20 institutions are at or above the 90th percentile for 'PhD' (percent of faculty with a Ph.D.).
- 18 of the 20 institutions are at or above the 90th percentile for 'Terminal' (percent of faculty with a terminal degree).
- 16 of the 20 have out-of-state tuition at or above the 90th percentile. California-Irvine and California-Berkeley are the notable outliers at the 69th and 65th percentiles respectively, and Georgia Institute of Technology is an extreme outlier at the 15.7th percentile.
- 70% of the Top 20 Elite have Room.Board expenses at or above the 85th percentile.
- Student-faculty ratio is generally lower than overall, with 14 of the top 20 having S.F.Ratio below the 25th percentile.
  Here again California-Irvine, California-Berkeley and Georgia Institute of Technology stand out, at the 73rd, 71st and 91st percentiles respectively.
- 15 of the 20 are at or above the 80th percentile for the proportion of alumni who donate (perc.alumni).
- Graduation rates are higher than the norm among the Elite institutions, with 16 of the top 20 'Elite' having Grad.Rate at or above the 93rd percentile.
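The percentile statements in the list above rank each Top 20 value against the full dataset. One plausible way to compute such a rank is the empirical CDF, i.e. the share of all colleges with a value less than or equal to the one in question. The notebook's own method is not shown, so this is only a sketch with made-up numbers; `pct_rank`, `base_vals` and `top_vals` are hypothetical names.

```r
# Made-up reference values standing in for a full-dataset column.
base_vals <- c(10, 40, 55, 60, 65, 70, 80, 90, 100, 118)

# Percentile rank: percent of reference values <= x.
pct_rank <- function(x, reference) round(100 * ecdf(reference)(x), 2)

# Made-up values for a handful of "top" institutions.
top_vals <- c(100, 99, 94, 66)
pct <- pct_rank(top_vals, base_vals)

# Count and share of institutions at or above the 90th percentile.
n_high     <- sum(pct >= 90)
share_high <- 100 * n_high / length(pct)

print(data.frame(value = top_vals, Percentile = pct))
cat(">= 90 :", n_high, ";", share_high, "\n")
```

With this definition a value equal to the dataset maximum gets a percentile of 100, and tied values share the same percentile.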
#'################ workings
Percentile table (single variable)
In [58]:
'Top 20 elite colleges'
< 90  : 4 ; 20
>= 90 : 16 ;  80
A data.frame: 20 × 5
      S.N  College                                Grad.Rate  Percentile  Percentile_Main
    <int>  <chr>                                      <int>       <dbl>            <dbl>
 2      1  Harvey Mudd College                          100       99.87            99.87
 5      2  Harvard University                           100       99.87            99.49
15      3  Amherst College                              100       99.87            98.20
 4      4  Yale University                               99       98.58            99.87
 7      5  Princeton University                          99       98.58            99.49
16      6  Williams College                              99       98.58            98.07
19      7  Columbia University                           99       98.58            97.68
 9      8  Dartmouth College                             98       97.94            98.97
 6      9  Duke University                               97       97.30            99.49
10     10  Brown University                              97       97.30            98.97
18     11  University of Notre Dame                      97       97.30            97.81
 1     12  Massachusetts Institute of Technology         94       94.98           100.00
20     13  Davidson College                              94       94.98            97.55
13     14  University of Pennsylvania                    93       94.47            98.58
14     15  Northwestern University                       92       93.69            98.58
17     16  Wellesley College                             91       93.05            97.94
 3     17  University of California at Berkeley          78       76.06            99.87
 8     18  Georgia Institute of Technology               70       61.39            99.10
11     19  Pepperdine University                         66       52.38            98.71
12     20  University of California at Irvine            66       52.38            98.58

 Anderson-Darling k-sample test.

Number of samples:  2
Sample sizes:  777, 20
Number of ties: 716

Mean of  Anderson-Darling  Criterion: 1
Standard deviation of  Anderson-Darling  Criterion: 0.7632

T.AD = ( Anderson-Darling  Criterion - mean)/sigma

Null Hypothesis: All samples come from a common population.

             AD  T.AD  asympt. P-value
version 1: 26.8 33.80        6.633e-15
version 2: 27.4 34.63        3.249e-15
Warning message in ks.test(base_df[, var], subset_df[, var]):
"p-value will be approximate in the presence of ties"
	Two-sample Kolmogorov-Smirnov test

data:  base_df[, var] and subset_df[, var]
D = 0.71763, p-value = 3.794e-09
alternative hypothesis: two-sided
#'#####################################################################
Boxplots
In [59]:
#'#####################################################################
Statistical test
In [60]:
A data.frame: 19 × 4
                KS       AD(v1)   AD(v2)   Significant
                <chr>    <chr>    <chr>    <chr>
Apps            0        0        0        Y
Accept          0.00126  0.00118  0.00115  Y
Enroll          0.00082  7e-05    7e-05    Y
Top10perc       0        0        0        Y
Top25perc       0        0        0        Y
F.Undergrad     0.00188  6e-04    0.00058  Y
P.Undergrad     0.02582  0.00286  0.00304  Y
Outstate        0        0        0        Y
Room.Board      0        0        0        Y
Books           0.01493  0.00058  0.00047  Y
Personal        0.50021  0.4499   0.47569  N
PhD             0        0        0        Y
Terminal        0        0        0        Y
S.F.Ratio       3e-05    0        0        Y
perc.alumni     3e-05    0        0        Y
Expend          0        0        0        Y
Grad.Rate       0        0        0        Y
TExp.without.O  0        0        0        Y
TExp.with.O     0        0        0        Y
#'################ workings'
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Short description of variables

Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.

Private : Public/private indicator
Apps : Number of applications received
Accept : Number of applicants accepted
Enroll : Number of new students enrolled
Top10perc : Percent of new students from top 10 % of high school class
Top25perc : Percent of new students from top 25 % of high school class
F.Undergrad : Number of full-time undergraduates
P.Undergrad : Number of part-time undergraduates
Outstate : Out-of-state tuition
Room.Board : Room and board costs
Books : Estimated book costs
Personal : Estimated personal spending
PhD : Percent of faculty with Ph.D.’s
Terminal : Percent of faculty with terminal degree
S.F.Ratio : Student/faculty ratio
perc.alumni : Percent of alumni who donate
Expend : Instructional expenditure per student
Grad.Rate : Graduation rate